PDF: http://acritch.com/papers/arches.pdf
BIBTEX:
@article{critch2020ai,
  title   = {{AI Research Considerations for Human Existential Safety (ARCHES)}},
  author  = {Critch, Andrew and Krueger, David},
  year    = {2020},
  journal = {Preprint at \href{http://acritch.com/arches}{acritch.com/arches}}
}
ABSTRACT:
Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity’s long-term prospects for survival as a species. In negative terms, we ask what existential risks humanity might face from AI development in the next century, and by what principles contemporary technical research might be directed to address those risks.
A key property of hypothetical AI technologies is introduced, called prepotence, which is useful for delineating a variety of potential existential risks from artificial intelligence, even as AI paradigms might shift. Twenty-nine contemporary research directions are then examined for their potential benefit to existential safety. Each research direction is explained with a scenario-driven motivation and examples of existing work from which to build. The research directions present their own risks and benefits to society, which could arise at various scales of impact; in particular, they are not guaranteed to benefit existential safety if major developments in them are deployed without adequate forethought and oversight. As such, each direction is accompanied by a consideration of potentially negative side effects.
Taken more broadly, the twenty-nine explanations also illustrate a rudimentary methodology for discussing and assessing the potential risks and benefits of research directions in terms of their impact on global catastrophic risks. This impact assessment methodology is far from mature, but seems valuable to highlight and improve upon as AI capabilities expand.
My goal in coauthoring ARCHES has been twofold:
- to represent the diversity and complexity of ways in which a wide variety of AI research directions relate to existential safety, and
- to introduce the concept of prepotence, a property of hypothetical AI systems that is a weaker condition than human-level AI, AGI, or superintelligence, yet is sufficient to pose a substantial existential risk. Roughly speaking, prepotent AI is AI that is both (globally) transformative and impossible or very difficult to “turn off” via human-coordinated efforts. (A more precise definition and discussion are given in the document.)
ARCHES is far from an authoritative or final statement on how to achieve AI existential safety, but I hope it will do some good in advancing discourse in AI safety and AI ethics, so that AI existential safety is treated as a less-ambiguously-defined area of positive interest. Enjoy!